Abstract

We give a simple probabilistic description of a transition between two states which 
leads to a generalized escort distribution. When the parameter of the distribution 
varies, it defines a parametric curve that we call an escort-path. The Renyi divergence 
appears as a natural by-product of the setting. We study the dynamics of the Fisher 
information on this path, and show in particular that the thermodynamic divergence 
is proportional to Jeffreys' divergence. Next, we consider the problem of inferring 
a distribution on the escort-path, subject to generalized moment constraints. We
show that our setting naturally induces a rationale for the minimization of the Renyi
information divergence. Then, we derive the optimum distribution as a generalized
q-Gaussian distribution.

Keywords: Divergence measures, Generalized Renyi and Tsallis entropies, Escort 
1. Introduction

In this paper, we give a simple probabilistic description of a transition between 
two states, which leads to a parametric curve in the form of a generalized escort 
distribution. We call this parametric curve the escort-path. In this setting, we show
that the Renyi information divergence emerges naturally as a characterization of the 
transition. Along this escort-path, we study the Fisher information. In particular, 
we show that the thermodynamic divergence on the escort-path is proportional to 
Jeffreys' divergence. Finally, we consider the inference of a distribution subject to 
moment constraints computed with respect to the escort distribution. First, we show that our
setting leads to a rationale for the minimization of the Renyi information divergence. 
Then, we derive the optimum distribution as a generalized Gaussian distribution. 
Before going into the details of the results, we present the context and introduce
the main definitions of our main ingredients, namely escort distributions,
information divergences, and Fisher information.

Throughout the paper, we will work with univariate probability densities defined
with respect to a general measure $\mu(x)$ on a set $X$. For instance, the
Shannon-Boltzmann entropy will be expressed as

$$H[f] = -\int f(x)\log f(x)\,d\mu(x). \tag{1}$$

As particular cases, if $X$ is the real line and $\mu$ the Lebesgue measure,
then the expression above corresponds to the differential entropy. When the set $X$
is $\mathbb{N}$ or a subset of $\mathbb{N}$ and $\mu$ the counting measure, the expression reduces to
the standard discrete entropy. When $\mu$ is a probability measure, the expression
(1) can also be seen as minus the relative entropy from the measure with density $f$ to the
measure $\mu$.

Let us now turn to the notion of escort distribution. If $f(x)$ is a univariate
probability density with respect to $\mu(x)$, then we define its escort distribution of
order $q$, $q > 0$, by

$$f_q(x) = \frac{f(x)^q}{\int f(x)^q\,d\mu(x)}. \tag{2}$$

Escort distributions have been introduced as an operational tool in the context of
multifractals [1], [2], with interesting connections with standard thermodynamics.
A discussion of their geometric properties can be found in [3, 4]. Escort distributions
also prove useful in source coding, where they make it possible to derive optimum
codewords with a length bounded by the Renyi entropy [5].

The results presented in this paper are connected to the nonextensive statistical
physics introduced by Tsallis, see e.g. [6]. Indeed, nonextensive statistical physics
uses a generalized entropy, makes use of escort distributions and exhibits generalized
Gaussians. All these elements will appear in our construction, which therefore could
lead to new viewpoints or interpretations in this context. It is particularly remarkable
that the derivation of the maximum Tsallis entropy distributions in nonextensive
thermostatistics requires a constraint in the form of an "escort mean value", that is,
a mean computed with respect to an escort distribution like (2) [7], [8].

One can immediately extend the notion of escort distribution to deal with two
probability densities $f(x)$ and $g(x)$ as follows.

Definition 1. Let $f$ and $g$ be two densities with respect to a common measure $\mu$,
with $g$ dominated by $f$. For $q > 0$ such that $M_q[f,g] = \int f(x)^q g(x)^{1-q}\,d\mu(x) < \infty$,
we call generalized escort distribution the function

$$f_q(x) = \frac{f(x)^q\,g(x)^{1-q}}{\int f(x)^q\,g(x)^{1-q}\,d\mu(x)}. \tag{3}$$
When there is no ambiguity, we will also denote by $E_q[\cdot]$ the statistical expectation
with respect to the generalized escort distribution with index $q$.

This generalized escort distribution is simply a weighted geometric mean of $f(x)$
and $g(x)$, and reduces to $f_q(x) = f(x)$ for $q = 1$ and to $f_q(x) = g(x)$ for $q = 0$.
Obviously, if $g(x)$ is a uniform density whose support includes the support of $f(x)$,
then the generalized escort distribution gives back the standard one (2). The
generalized escort (3) appeared in Chernoff's analysis of the efficiency of hypothesis
tests [9], and makes it possible to define the best achievable exponent in the Bayesian
probability of error [10, Chapter 11]. As $q$ varies, the generalized escort distribution
defines a curve that connects $f(x)$ to $g(x)$ and beyond. In the general framework of
information geometry [11], the generalized escort distribution (3) coincides with the
geodesic joining $f$ and $g$ in the case of an exponential connection. Such an
interpretation also appeared in a work by Campbell [12].
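As a minimal sketch (not part of the original paper), assume discrete densities with strictly positive entries on a finite set, so that $\mu$ is the counting measure. The generalized escort (3) and the standard escort (2) then become normalized finite products, and the limiting cases just described can be checked numerically. The function names `generalized_escort` and `escort` are illustrative.

```python
def generalized_escort(f, g, q):
    """Generalized escort f_q = f^q g^(1-q) / M_q[f, g], as in (3).

    Assumes f and g are strictly positive probability vectors on a finite set
    (counting measure), so that f^q g^(1-q) is defined for any real q.
    """
    weights = [fx ** q * gx ** (1.0 - q) for fx, gx in zip(f, g)]
    m_q = sum(weights)  # normalizer M_q[f, g]
    return [w / m_q for w in weights]


def escort(f, q):
    """Standard escort distribution (2): f^q / sum(f^q)."""
    weights = [fx ** q for fx in f]
    total = sum(weights)
    return [w / total for w in weights]


f = [0.7, 0.2, 0.1]
g = [0.2, 0.3, 0.5]
uniform = [1.0 / 3.0] * 3

print(generalized_escort(f, g, 1.0))  # q = 1 gives back f
print(generalized_escort(f, g, 0.0))  # q = 0 gives back g
# With g uniform, the generalized escort reduces to the standard escort
print(generalized_escort(f, uniform, 2.0))
print(escort(f, 2.0))
```

The uniform case works because the constant factor $g^{1-q}$ cancels in the normalization.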

Throughout the paper, we will focus on the generalized escort distribution and
the path it defines, which we will call the escort-path.

Distances between probability distributions will be measured by means of information
divergences. We will use the Kullback-Leibler directed information divergence,
which is defined as follows.

Definition 2. Let $f$ and $g$ be two univariate densities with respect to a common
measure $\mu$, with $f$ absolutely continuous with respect to $g$. The Kullback-Leibler
directed information divergence is given by

$$D(f\|g) = \int f(x)\log\frac{f(x)}{g(x)}\,d\mu(x). \tag{4}$$

It is understood, as usual, that $0\log 0 = 0\log\frac{0}{a} = 0\log\frac{0}{0} = 0$. Note that if
we take $g(x) = 1$ in the expression above, then we obtain minus the Shannon entropy
$H[f]$. Let us also recall that the minimization of the Kullback-Leibler divergence is
a well-established inference method, analogous to Jaynes' maximum entropy approach,
and supported in particular by large deviation results [13]. We will also
make use of the Renyi information divergence introduced in [14].

Definition 3. Let $f$ and $g$ be two probability densities with respect to a measure $\mu$.
If $f$ is absolutely continuous with respect to $g$, then, for $q > 0$, $q \neq 1$, such that
$M_q[f,g] = \int f(x)^q g(x)^{1-q}\,d\mu(x) < \infty$, the Renyi divergence is defined by

$$D_q(f\|g) = \frac{1}{q-1}\log\int f(x)^q\,g(x)^{1-q}\,d\mu(x). \tag{5}$$

Let us recall that the divergence is always nonnegative, $D_q(f\|g) \ge 0$, with
equality iff $f = g$. By L'Hôpital's rule, the Kullback divergence is recovered in
the limit $q \to 1$. Taking $g(x) = 1$ in the expression of the Renyi divergence yields
the negative of the Renyi entropy, denoted $H_q[f]$.
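For discrete densities (counting measure), (4) and (5) reduce to finite sums. The sketch below assumes strictly positive vectors and uses illustrative helper names; it evaluates both divergences and shows numerically that $D_q(f\|g)$ approaches $D(f\|g)$ as $q \to 1$.

```python
import math


def kl_divergence(f, g):
    """D(f||g) in (4) for discrete densities, with the convention 0 log 0 = 0."""
    return sum(fx * math.log(fx / gx) for fx, gx in zip(f, g) if fx > 0)


def renyi_divergence(f, g, q):
    """D_q(f||g) in (5) for q > 0, q != 1; assumes g is strictly positive."""
    m_q = sum(fx ** q * gx ** (1.0 - q) for fx, gx in zip(f, g))
    return math.log(m_q) / (q - 1.0)


f = [0.7, 0.2, 0.1]
g = [0.2, 0.3, 0.5]
for q in (0.5, 0.9, 0.999, 1.001, 2.0):
    print(q, renyi_divergence(f, g, q))
print("Kullback-Leibler:", kl_divergence(f, g))
```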

We will study Fisher information along the escort-path. Indeed, it is well known
that the Fisher information metric is a Riemannian metric that can be defined on a
smooth statistical manifold [15], [16]. Furthermore, the Fisher information serves as
a measure of the information about a parameter in a distribution. It has intricate
relationships with maximum likelihood and many implications in estimation
theory, as exemplified by the Cramér-Rao bound, which provides a fundamental lower
bound on the variance of an estimator [17]. It is also used as a method of inference
and understanding in statistical physics and biology, as promoted by Frieden [18], [19].

Definition 4. Let $f(x;\theta)$ denote a probability density with respect to a measure $\mu$,
where $\theta$ is a real parameter, and suppose that $f(x;\theta)$ is differentiable with respect
to $\theta$. Then, the Fisher information in the density $f$ about the parameter $\theta$ is defined
as

$$I(\theta) = \int\left(\frac{\partial}{\partial\theta}\log f(x;\theta)\right)^2 f(x;\theta)\,d\mu(x). \tag{6}$$
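As a standard textbook illustration of (6), not taken from the paper: for the Gaussian location family on the real line, with Lebesgue measure and known variance $\sigma^2$, the score is linear in $x$ and the Fisher information does not depend on $\theta$.

```latex
f(x;\theta) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(x-\theta)^2}{2\sigma^2}\right),
\qquad
\frac{\partial}{\partial\theta}\log f(x;\theta) = \frac{x-\theta}{\sigma^2},
\qquad
I(\theta) = \int \frac{(x-\theta)^2}{\sigma^4}\, f(x;\theta)\,dx = \frac{\sigma^2}{\sigma^4} = \frac{1}{\sigma^2}.
```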

The remainder of the paper is structured as follows. In Section 2, we show that
the generalized escort presented above arises naturally in a simple probabilistic
description of a transition between two states. Interestingly, the Renyi information
divergence, and in a particular case the Renyi entropy, emerges as a characterization
of the transition.

In Section 3, we study the Fisher information, with respect to $q$, along the
escort-path. We show in particular that the integral of the Fisher information along the
path, the thermodynamic divergence, is proportional to Jeffreys' divergence.

In Section 4, we consider the problem of inferring the distribution $f(x)$ in (2)
or (3) on the escort-path when the only available information is given as a mean
value. This mean value is the statistical expectation taken with respect to an escort
distribution: this is the escort mean value used in nonextensive statistics. Different
possible approaches, such as minimizing the directed divergence, Jeffreys'
divergence or the thermodynamic divergence, reduce to the minimization of the Renyi
information divergence. In this case, the probability distribution that emerges is a
generalized Gaussian distribution, which is particularly important in applications.

2. The escort-path 

It has been observed that Tsallis' extended thermodynamics seems particularly
appropriate in the case of deviations from the classical Boltzmann-Gibbs equilibrium.
This suggests that the original MaxEnt formulation "find the closest distribution to a
reference under a mean constraint" may be amended by introducing a new constraint
that displaces the equilibrium. The partial or displaced equilibrium can be imagined
as an equilibrium characterized by two distributions, say $p_0(x)$ and $p_1(x)$. Instead of
selecting the nearest distribution to a reference under a mean constraint, we may look
for a distribution $p_q(x)$ simultaneously close, in some sense, to two distinct references:
such a distribution will be localized somewhere "between" $p_0(x)$ and $p_1(x)$.
[Figure 1, two panels: (a) Case $\eta < D(p_1\|p_0)$; (b) Case $\eta > D(p_1\|p_0)$.]

Figure 1: Constrained equilibrium between states $p_0$ and $p_1$: the equilibrium
distribution is sought in the set of all distributions such that $D(p\|p_0) = \eta$, and with
minimum Kullback distance to $p_1$. The equilibrium distribution $p_q$, the generalized
escort distribution, is "aligned" with $p_0$ and $p_1$ and intersects the set $D(p\|p_0) = \eta$.



2.1. Displaced equilibrium 

We consider two equilibrium states with respective probability densities $p_0(x)$ and
$p_1(x)$ with respect to a common measure at some point $x$ in the phase space, and
we look at intermediate states defined by the following scenario. The system, with
initial state $p_0$ and subject to a generalized force, is moved to a distance $\eta = D(p\|p_0)$
from $p_0$, where $D(p\|p_0)$ is the Kullback-Leibler divergence (or relative entropy) from
$p$ to $p_0$. Then, the system is attracted toward the final state $p_1$. Therefore, the
new intermediate equilibrium state, say $p_q$, is chosen as the one which minimizes
its divergence to the attractor $p_1$ while being held at the distance $\eta$ from $p_0$. As
illustrated in Figure 1, the intermediate probability density is located on the "straight
line" $p_0 - p_1$ and intersects the circle with radius $\eta$ centered at $p_0$. More precisely,
the problem can be written as follows:

$$\begin{aligned} &\min_p\; D(p\|p_1)\\ &\text{s.t. } D(p\|p_0) = \eta\\ &\text{and } \int p(x)\,d\mu(x) = 1 \end{aligned} \tag{7}$$

where "s.t." stands for "subject to", and where the Kullback-Leibler divergence
$D(f\|g)$ is defined by (4). The solution is given by the following theorem.

Theorem 5. Let $p_1$ be a probability density function with respect to $\mu$, and $p_0$ a
nonnegative function. Assume that $p_1$ is absolutely continuous with respect to $p_0$. Let $p_q$
denote the generalized escort distribution with index $q > 0$

$$p_q(x) = \frac{p_1(x)^q\,p_0(x)^{1-q}}{\int p_1(x)^q\,p_0(x)^{1-q}\,d\mu(x)}, \tag{8}$$

with $M_q(p_1,p_0) = \int p_1(x)^q\,p_0(x)^{1-q}\,d\mu(x) < \infty$. If $E_q\left[\log\frac{p_1}{p_0}\right]$ is finite, where $E_q[\cdot]$
denotes the statistical expectation with respect to $p_q$, and if $q$ is chosen such that
$D(p_q\|p_0) = \eta$, then the generalized escort distribution (8) is the unique solution of
problem (7).

Proof. Let us evaluate the divergence $D(p\|p_q)$. For all densities $p$ satisfying
$D(p\|p_0) = \eta$, we have

$$D(p\|p_q) = \int p(x)\log\frac{p(x)}{p_q(x)}\,d\mu(x) = \int p(x)\log\frac{p(x)^q\,p(x)^{1-q}}{p_1(x)^q\,p_0(x)^{1-q}}\,d\mu(x) + \log M_q(p_1,p_0) \tag{9}$$

$$= q\int p(x)\log\frac{p(x)}{p_1(x)}\,d\mu(x) + (1-q)\int p(x)\log\frac{p(x)}{p_0(x)}\,d\mu(x) + \log M_q(p_1,p_0) \tag{10}$$

$$= q\,D(p\|p_1) + (1-q)\,\eta + \log M_q(p_1,p_0). \tag{11}$$

Observe that $D(p_q\|p_0) = q\,E_q\left[\log\frac{p_1}{p_0}\right] - \log M_q$ and that
$D(p_q\|p_1) = (1-q)\,E_q\left[\log\frac{p_0}{p_1}\right] - \log M_q$, so that both divergences exist.
Therefore, taking $p = p_q$, the last equality gives

$$D(p_q\|p_q) = q\,D(p_q\|p_1) + (1-q)\,\eta + \log M_q(p_1,p_0), \tag{12}$$

where of course $D(p_q\|p_q) = 0$. Finally, subtracting (12) from (11) yields

$$D(p\|p_q) - D(p_q\|p_q) = q\left(D(p\|p_1) - D(p_q\|p_1)\right).$$

Since $q > 0$, and since $D(p\|p_q) \ge 0$ with equality iff $p = p_q$, we obtain that
$D(p\|p_1) \ge D(p_q\|p_1)$, with equality iff $p = p_q$, which proves the theorem. □
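The key step of the proof is the identity (10), which in fact holds for every density $p$ once $\eta$ is replaced by $D(p\|p_0)$. The sketch below checks it numerically on random discrete densities; the finite support, the random seed and the helper names are illustrative assumptions. Combined with (12), it shows that any $p$ with $D(p\|p_0) = D(p_q\|p_0)$ satisfies $D(p\|p_1) - D(p_q\|p_1) = D(p\|p_q)/q \ge 0$.

```python
import math
import random


def kl(f, g):
    """D(f||g) on a finite set; assumes g > 0 wherever f > 0."""
    return sum(fx * math.log(fx / gx) for fx, gx in zip(f, g) if fx > 0)


def gen_escort(p1, p0, q):
    """Generalized escort (8) and its normalizer M_q(p1, p0); assumes p0, p1 > 0."""
    w = [a ** q * b ** (1.0 - q) for a, b in zip(p1, p0)]
    m_q = sum(w)
    return [x / m_q for x in w], m_q


def random_density(n, rng):
    """Random strictly positive probability vector of length n."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    s = sum(w)
    return [x / s for x in w]


rng = random.Random(0)
p0, p1 = random_density(5, rng), random_density(5, rng)
q = 0.6
pq, m_q = gen_escort(p1, p0, q)

# Identity (10): D(p||p_q) = q D(p||p1) + (1 - q) D(p||p0) + log M_q, for any p
for _ in range(3):
    p = random_density(5, rng)
    lhs = kl(p, pq)
    rhs = q * kl(p, p1) + (1 - q) * kl(p, p0) + math.log(m_q)
    print(lhs, rhs)
```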

It is interesting to note that (8) is nothing but a generalized version of
the escort or zooming distribution of nonextensive thermostatistics, and that the
corresponding statistical expectations are the so-called escort means or generalized
averages. Obviously, one recovers a standard escort distribution like (2) when $p_0(x)$
is uniform with respect to $\mu$. This is immediate if $\mu$ has a compact support. However,
if one wants to use a uniform measure on the whole real axis, with $\mu$ the Lebesgue
measure, then $p_0$ is no longer a probability density, since it integrates to
infinity. It is still possible to modify the formulation to include this case
as well. Indeed, with $p_0(x) = 1$, the Kullback-Leibler divergence
$D(p\|p_0)$ becomes nothing but minus the standard entropy

$$H[p] = -\int p(x)\log p(x)\,d\mu(x).$$

Therefore, the problem turns into the search for a distribution with a given entropy
which minimizes the divergence to $p_1$:

$$\begin{aligned} &\min_p\; D(p\|p_1)\\ &\text{s.t. } H[p] = -\eta\\ &\text{and } \int p(x)\,d\mu(x) = 1. \end{aligned} \tag{13}$$
This setting can be illustrated as was done in Figure 1, except that the circle now
corresponds to the set of distributions with a given level of entropy. Observe that
neither Theorem 5 nor its proof requires $p_0$ to be a probability density. Therefore,
we can take $p_0(x) = 1$ and obtain the solution of (13) as a simple corollary.



Corollary 6. Let $p_q$ denote the escort distribution with index $q$ associated with $p_1$,
defined by

$$p_q(x) = \frac{p_1(x)^q}{\int p_1(x)^q\,d\mu(x)}, \tag{14}$$

provided that $M_q(p_1) = \int p_1(x)^q\,d\mu(x) < \infty$. If $E_q[\log p_1]$ is finite, where $E_q[\cdot]$
denotes the statistical expectation with respect to $p_q$, and if $q$ is chosen such that
$H[p_q] = -\eta$, then the escort distribution (14) is the unique solution of problem (13).



When $q$ varies, the function $\eta(q) = D(p_q\|p_0)$ is monotonically increasing, so that
the index $q$ associated with a given intermediate value $\eta$ is defined implicitly by
$D(p_q\|p_0) = \eta$. This property will be proved in Section 3, Corollary 10, as a simple
consequence of a result on Fisher information. For $q = 0$ we have $\eta = 0$, and for $q = 1$
we have $\eta = D(p_1\|p_0)$. Accordingly, as $q$ varies, $p_q$ traces out a curve, the
escort-path, that connects $p_0$ ($q = 0$) and $p_1$ ($q = 1$). In the case $q > 1$, we have
$\eta > D(p_1\|p_0)$, as shown in Figure 1(b).
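Because $\eta(q)$ is increasing, the index that realizes a prescribed distance $\eta$ can be found by a one-dimensional root search. The sketch below is an illustration on a finite set, not the paper's procedure: it uses bisection, assumes strictly positive $p_0, p_1$, and assumes the bracket upper bound `q_hi` is large enough that $\eta(q_{hi}) \ge \eta$. All names are hypothetical.

```python
import math


def kl(f, g):
    """D(f||g) on a finite set (counting measure)."""
    return sum(fx * math.log(fx / gx) for fx, gx in zip(f, g) if fx > 0)


def gen_escort(p1, p0, q):
    """Generalized escort (8); assumes p0 and p1 strictly positive."""
    w = [a ** q * b ** (1.0 - q) for a, b in zip(p1, p0)]
    s = sum(w)
    return [x / s for x in w]


def eta(p1, p0, q):
    """eta(q) = D(p_q||p0): divergence of the escort-path point from p0."""
    return kl(gen_escort(p1, p0, q), p0)


def index_for_distance(p1, p0, target, q_hi=50.0, tol=1e-12):
    """Solve D(p_q||p0) = target for q by bisection on [0, q_hi].

    Relies on eta being increasing in q and assumes eta(q_hi) >= target.
    """
    lo, hi = 0.0, q_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if eta(p1, p0, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


p0 = [0.25, 0.25, 0.25, 0.25]
p1 = [0.55, 0.25, 0.15, 0.05]
print([round(eta(p1, p0, q), 6) for q in (0.0, 0.5, 1.0, 1.5, 2.0)])
q_star = index_for_distance(p1, p0, 0.5 * kl(p1, p0))
print(q_star, eta(p1, p0, q_star))
```

Asking for half of $D(p_1\|p_0)$ returns an index strictly between 0 and 1, in agreement with the picture of Figure 1(a).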

Interestingly enough, recent results have shown that the average dissipated work
during a transition can be expressed as a relative entropy [20], [21]. Along these lines,
with a Hamiltonian even in the momenta, the minimization of $D(p\|p_1)$ may be
understood as a minimization of the average dissipated work for a transition from $p$
to $p_1$.

2.2. Renyi and Jeffreys' divergences as by-products 

It is worth pointing out that the Renyi divergence and entropy arise as
by-products of our construction. Indeed, the minimum of the Kullback-Leibler divergence
can be expressed as follows.

Corollary 7. The minimum divergence is given by

$$D(p_q\|p_1) = \left(1 - \frac{1}{q}\right)\left(\eta - D_q(p_1\|p_0)\right), \tag{15}$$

where $D_q(p_1\|p_0)$ is the Renyi information divergence with index $q$, from $p_1$ to $p_0$.

Proof. By direct calculation from the expression of the solution $p_q(x)$, or as a direct
consequence of relation (12), using $\log M_q(p_1,p_0) = (q-1)\,D_q(p_1\|p_0)$. □
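Relation (15) is easy to confirm numerically. The sketch below is an illustration on a four-point set with arbitrary strictly positive $p_0$, $p_1$ (assumed values and hypothetical helper names). It computes both sides for indices on either side of $q = 1$.

```python
import math


def kl(f, g):
    """D(f||g) on a finite set."""
    return sum(fx * math.log(fx / gx) for fx, gx in zip(f, g) if fx > 0)


def renyi(f, g, q):
    """D_q(f||g) as in (5); assumes q > 0, q != 1 and g > 0."""
    return math.log(sum(a ** q * b ** (1.0 - q) for a, b in zip(f, g))) / (q - 1.0)


def gen_escort(p1, p0, q):
    """Generalized escort (8) on a finite set."""
    w = [a ** q * b ** (1.0 - q) for a, b in zip(p1, p0)]
    s = sum(w)
    return [x / s for x in w]


def check_corollary_7(p1, p0, q):
    """Return both sides of (15) for the escort-path point p_q."""
    pq = gen_escort(p1, p0, q)
    eta = kl(pq, p0)                  # eta = D(p_q||p0)
    lhs = kl(pq, p1)                  # minimum divergence D(p_q||p1)
    rhs = (1.0 - 1.0 / q) * (eta - renyi(p1, p0, q))
    return lhs, rhs


p0 = [0.1, 0.2, 0.3, 0.4]
p1 = [0.4, 0.3, 0.2, 0.1]
for q in (0.3, 0.8, 1.7, 3.0):
    print(q, check_corollary_7(p1, p0, q))
```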



If $p_0$ is uniform, then $-D_q(p_1\|p_0) = H_q[p_1]$, the Renyi entropy, $p_q$
is the standard escort distribution, and (15) becomes

$$D(p_q\|p_1) = \left(1 - \frac{1}{q}\right)\left(\eta + H_q[p_1]\right).$$
Although it is convenient to think of the Kullback-Leibler divergence $D(f\|g)$ in (4)
as a distance between $f$ and $g$, it is not symmetric and does not satisfy the triangle
inequality. Kullback and Leibler themselves introduced a symmetrized version, which
had also been considered earlier by Jeffreys. This Jeffreys' divergence appears here to be
a simple affine function of the Renyi information divergence $D_q(p_1\|p_0)$.

Corollary 8. The Jeffreys divergence between $p_1$ and the generalized escort
distribution $p_q$ is given by

$$J(p_1,p_q) = D(p_1\|p_q) + D(p_q\|p_1) = \frac{(q-1)^2}{q}\left(D_q(p_1\|p_0) - \eta\right). \tag{16}$$

Proof. This is a simple consequence of (11), which gives $D(p_1\|p_q) = (1-q)\,\eta + \log M_q(p_1,p_0)$
if $p = p_1$, and of (12), which gives $D(p_q\|p_1) = \left(1 - \frac{1}{q}\right)\eta - \frac{1}{q}\log M_q(p_1,p_0)$. □

As an interesting consequence, we see that if one wants to minimize the symmetric
divergence between $p_1$ and $p_q$, subject to additional constraints, then this simply
amounts to the minimization of the Renyi information divergence under the same
constraints. When $p_0$ is uniform, this becomes the maximization of the Renyi entropy,
or equivalently of the Tsallis entropy. It is thus interesting that our setting induces
both an escort distribution and a Renyi divergence (or entropy), moreover with a
common index $q$. Although these two quantities are essential ingredients of
nonextensive statistical mechanics, their relationship is still discussed, see e.g. [22].



3. Fisher information along the escort-path 

Suppose now that $p_0(x)$ and $p_1(x)$ depend on a parameter $\theta$. The Fisher information
metric is based on the Fisher information matrix for a vector parameter $\theta$
attached to a density $p(x;\theta)$. This Fisher information matrix has entries

$$I_{ij}(\theta) = \int p(x;\theta)\left(\frac{\partial}{\partial\theta_i}\log p(x;\theta)\right)\left(\frac{\partial}{\partial\theta_j}\log p(x;\theta)\right)d\mu(x).$$

The derivative of the logarithm of the density with respect to the parameter is
called the score function. The mean of the score function is zero, so that the Fisher
information matrix is the covariance of the score function.

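The length of a curve parametrized by $t$, from $0$ to $T$, is given by

$$\mathcal{L} = \int_0^T \sqrt{\sum_{i,j}\frac{d\theta_i}{dt}\,I_{ij}(\theta)\,\frac{d\theta_j}{dt}}\;dt.$$

In the context of thermodynamics, this quantity is called the thermodynamic length
[23]-[25]. A related quantity is the thermodynamic divergence, or energy of the
curve, given by

$$\mathcal{J} = T\int_0^T \sum_{i,j}\frac{d\theta_i}{dt}\,I_{ij}(\theta)\,\frac{d\theta_j}{dt}\;dt.$$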
By Jensen's inequality, we immediately have $\mathcal{J} \ge \mathcal{L}^2$. An interesting point,
which outlines the importance of these quantities, is the fact that the thermodynamic
divergence asymptotically bounds the dissipation induced by a finite-time
transformation of a thermodynamic system [26], [24]. Hence, it is interesting here to study
some characteristics of the Fisher information along the escort-path. The general
study of the Fisher information on the escort-path with respect to a general
parameter $\theta$ is interesting in its own right. However, in order to save space, we will
focus here on a special case. Let us simply mention that when $p_0$ is uniform, the
related Fisher information is the escort-Fisher information, which has been considered
in [27]-[29].
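For completeness, here is the one-line argument behind $\mathcal{J} \ge \mathcal{L}^2$. Write $g(t) = \sum_{i,j}\frac{d\theta_i}{dt}I_{ij}(\theta)\frac{d\theta_j}{dt} \ge 0$, so that $\mathcal{L} = \int_0^T\sqrt{g(t)}\,dt$ and $\mathcal{J} = T\int_0^T g(t)\,dt$, and apply Jensen's inequality to the convex function $u \mapsto u^2$ under the uniform probability $dt/T$ on $[0,T]$:

```latex
\left(\frac{1}{T}\int_0^T \sqrt{g(t)}\,dt\right)^2 \le \frac{1}{T}\int_0^T g(t)\,dt
\quad\Longrightarrow\quad
\mathcal{L}^2 \le T\int_0^T g(t)\,dt = \mathcal{J}.
```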

As we have seen, the generalized escort distribution describes a geometric path,
the escort-path, connecting distributions $p_0$ and $p_1$ for the values $q = 0$ and $q = 1$.
Clearly, the densities on the escort-path are characterized by the index $q$. Hence it is
quite natural to evaluate the distance between two densities on the path, as well as
the Fisher information with respect to $q$. Let us begin with a general expression of the
Fisher information on the path. Then, we will be able to link this Fisher information
to information divergences on the path.

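Theorem 9. Let $p_q$ be the generalized escort distribution as in (8). Then, the Fisher
information with respect to $q$ of the generalized escort distribution is given by

$$I(q) = \int \frac{1}{p_q(x)}\left(\frac{\partial p_q(x)}{\partial q}\right)^2 d\mu(x) = \int \frac{\partial p_q(x)}{\partial q}\,\log\frac{p_1(x)}{p_0(x)}\,d\mu(x), \tag{17}$$

provided that $E_r\left[\left(\log\frac{p_1}{p_0}\right)^2\right]$ is finite for $r$ in a compact neighborhood of $q$.
The Fisher information with respect to $q$ can also be written as the variance of the
log-likelihood ratio:

$$I(q) = E_q\left[\left(\log\frac{p_1(x)}{p_0(x)}\right)^2\right] - E_q\left[\log\frac{p_1(x)}{p_0(x)}\right]^2. \tag{18}$$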



Proof. The second order moment condition on the log-likelihood ratio implies, by
Jensen's inequality, that both $E_q\left[\left|\log\frac{p_1}{p_0}\right|\right]$ and $E_q\left[\log\frac{p_1}{p_0}\right]$ are finite. Let us first
consider $M_q(p_1,p_0) = \int p_1(x)^q\,p_0(x)^{1-q}\,d\mu(x)$. The integrand is clearly differentiable
with respect to $q$, and this derivative, which is equal to $M_q\,p_q(x)\log\frac{p_1(x)}{p_0(x)}$, is continuous and
absolutely integrable since $E_q\left[\left|\log\frac{p_1}{p_0}\right|\right]$ is finite. Furthermore, by the second order
moment hypothesis, the last expression is also locally integrable with respect to $q$.
This enables us to use Leibniz's rule and differentiate under the integral sign, which gives

$$\frac{d\log M_q}{dq} = \int p_q(x)\log\frac{p_1(x)}{p_0(x)}\,d\mu(x) = E_q\left[\log\frac{p_1(x)}{p_0(x)}\right]. \tag{19}$$

Then, by direct calculation, we also have

$$\frac{\partial p_q(x)}{\partial q} = p_q(x)\left(\log\frac{p_1(x)}{p_0(x)} - E_q\left[\log\frac{p_1(x)}{p_0(x)}\right]\right), \tag{20}$$

which, inserted in the definition of the Fisher information in (17), gives (18).
By (20), we have that

$$\int\left|\frac{\partial p_q(x)}{\partial q}\right|d\mu(x) \le E_q\left[\left|\log\frac{p_1}{p_0}\right|\right] + \left|E_q\left[\log\frac{p_1}{p_0}\right]\right| < \infty. \tag{21}$$

Moreover, by the second order moment hypothesis, (21) is also locally integrable with
respect to $q$. Since $\int p_q(x)\,d\mu(x) = 1$, by Leibniz's rule we get that
$\int \frac{\partial p_q(x)}{\partial q}\,d\mu(x) = 0$. Finally, the right hand side of (17) is obtained by using (20) and
the fact that

$$\int \frac{\partial p_q(x)}{\partial q}\,\frac{d\log M_q}{dq}\,d\mu(x) = \frac{d\log M_q}{dq}\int\frac{\partial p_q(x)}{\partial q}\,d\mu(x) = 0.$$

□
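The three expressions of $I(q)$ in (17) and (18) can be compared numerically. In the sketch below, which is an illustration on a finite set with assumed values, $\partial p_q/\partial q$ is approximated by a central finite difference with step `h` (an arbitrary choice), while the variance form (18) is computed exactly. Helper names are hypothetical.

```python
import math


def gen_escort(p1, p0, q):
    """Generalized escort (8) on a finite set; assumes p0, p1 > 0."""
    w = [a ** q * b ** (1.0 - q) for a, b in zip(p1, p0)]
    s = sum(w)
    return [x / s for x in w]


def llr(p1, p0):
    """Pointwise log-likelihood ratio log(p1/p0)."""
    return [math.log(a / b) for a, b in zip(p1, p0)]


def fisher_variance(p1, p0, q):
    """(18): variance of log(p1/p0) under p_q."""
    pq, l = gen_escort(p1, p0, q), llr(p1, p0)
    mean = sum(w * x for w, x in zip(pq, l))
    return sum(w * (x - mean) ** 2 for w, x in zip(pq, l))


def dpq_dq(p1, p0, q, h=1e-5):
    """Central finite difference of p_q with respect to q."""
    plus, minus = gen_escort(p1, p0, q + h), gen_escort(p1, p0, q - h)
    return [(a - b) / (2.0 * h) for a, b in zip(plus, minus)]


def fisher_left(p1, p0, q):
    """Left form of (17): sum of (dp_q/dq)^2 / p_q."""
    return sum(d * d / w for d, w in zip(dpq_dq(p1, p0, q), gen_escort(p1, p0, q)))


def fisher_right(p1, p0, q):
    """Right form of (17): sum of dp_q/dq * log(p1/p0)."""
    return sum(d * x for d, x in zip(dpq_dq(p1, p0, q), llr(p1, p0)))


p0 = [0.1, 0.2, 0.3, 0.4]
p1 = [0.4, 0.3, 0.2, 0.1]
for q in (0.5, 1.0, 2.0):
    print(q, fisher_variance(p1, p0, q), fisher_left(p1, p0, q), fisher_right(p1, p0, q))
```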



As a simple consequence, we can now check that $\eta = D(p_q\|p_0)$ is indeed a
monotone increasing function of $q$, as announced in Section 2.

Corollary 10. Let $p_q$ be a generalized escort distribution, with $q > 0$, and assume
that $E_r\left[\left(\log\frac{p_1}{p_0}\right)^2\right] < \infty$ for $r$ in a compact neighborhood of $q$. Then
$\eta(q) = D(p_q\|p_0)$ is a strictly monotone increasing function of $q$, with

$$\frac{d\eta}{dq} = q\,I(q) > 0. \tag{22}$$

Proof. Note that $\eta(q) = \int p_q(x)\log\frac{p_q(x)}{p_0(x)}\,d\mu(x) = q\int p_q(x)\log\frac{p_1(x)}{p_0(x)}\,d\mu(x) - \log M_q$.
Under the second order moment condition, one can differentiate under the integral
sign; taking (19) into account, it remains

$$\frac{d\eta}{dq} = q\int\frac{\partial p_q(x)}{\partial q}\,\log\frac{p_1(x)}{p_0(x)}\,d\mu(x),$$

where we recognize the Fisher information in (17). Therefore, since
both $q$ and the Fisher information are positive, we get (22). □
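Relation (22) can be checked by differentiating $\eta(q)$ numerically and comparing with $q\,I(q)$ computed from the variance form (18). This is a sketch on a finite set with assumed values; the finite-difference step and the helper names are illustrative.

```python
import math


def kl(f, g):
    """D(f||g) on a finite set."""
    return sum(fx * math.log(fx / gx) for fx, gx in zip(f, g) if fx > 0)


def gen_escort(p1, p0, q):
    """Generalized escort (8) on a finite set; assumes p0, p1 > 0."""
    w = [a ** q * b ** (1.0 - q) for a, b in zip(p1, p0)]
    s = sum(w)
    return [x / s for x in w]


def fisher_variance(p1, p0, q):
    """I(q) from (18): variance of log(p1/p0) under p_q."""
    pq = gen_escort(p1, p0, q)
    l = [math.log(a / b) for a, b in zip(p1, p0)]
    mean = sum(w * x for w, x in zip(pq, l))
    return sum(w * (x - mean) ** 2 for w, x in zip(pq, l))


def eta(p1, p0, q):
    """eta(q) = D(p_q||p0)."""
    return kl(gen_escort(p1, p0, q), p0)


def deta_dq(p1, p0, q, h=1e-5):
    """Central finite difference of eta with respect to q."""
    return (eta(p1, p0, q + h) - eta(p1, p0, q - h)) / (2.0 * h)


p0 = [0.1, 0.2, 0.3, 0.4]
p1 = [0.4, 0.3, 0.2, 0.1]
for q in (0.25, 1.0, 2.5):
    print(q, deta_dq(p1, p0, q), q * fisher_variance(p1, p0, q))
```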



Finally, an important result is that the integral of the Fisher information, the
"energy" of the curve, is nothing but the Jeffreys divergence. This result is mentioned
in [30]. It can also be obtained as a consequence of the general integral
representation of the Kullback-Leibler divergence [11, eq. 3.71]. We propose
here a direct proof of the result.

Theorem 11. Let $p_r$ and $p_s$ be two generalized escort distributions. Assume that
$E_q\left[\left(\log\frac{p_1}{p_0}\right)^2\right] < \infty$ for all $q \in [r, s]$. Then, the integral of the Fisher information
along the escort-path, from $q = r$ to $q = s$, is proportional to Jeffreys' divergence
between $p_r$ and $p_s$:

$$(s-r)\int_r^s I(q)\,dq = J(p_s,p_r) = D(p_s\|p_r) + D(p_r\|p_s). \tag{23}$$

With $r = 0$ and $s = 1$, we get the integral along the whole path connecting $p_0$ and $p_1$,
that is

$$\int_0^1 I(q)\,dq = J(p_1,p_0) = D(p_1\|p_0) + D(p_0\|p_1). \tag{24}$$


Proof. The Fisher information is finite on the escort-path; therefore its integral over
a compact interval is also finite. Let us integrate the right equality in (17):

$$\int_r^s I(q)\,dq = \int_r^s\int\frac{\partial p_q(x)}{\partial q}\,\log\frac{p_1(x)}{p_0(x)}\,d\mu(x)\,dq. \tag{25}$$

Since $I(q)$ is positive and $\int_r^s I(q)\,dq$ is finite, it is possible to apply Fubini's theorem to
the right hand side of (25) and exchange the order of integrations. Thus, integrating
with respect to $q$ yields

$$\int_r^s I(q)\,dq = \int\left(p_s(x) - p_r(x)\right)\log\frac{p_1(x)}{p_0(x)}\,d\mu(x). \tag{26}$$

On the other hand, the divergence $D(p_s\|p_r)$ writes

$$D(p_s\|p_r) = (s-r)\int p_s(x)\log\frac{p_1(x)}{p_0(x)}\,d\mu(x) - \log M_s + \log M_r,$$

and similarly for $D(p_r\|p_s)$. Adding the two divergences and taking (26) into account
gives the result (23). □
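Theorem 11 lends itself to a direct numerical check: integrate the variance form (18) of $I(q)$ with a composite Simpson rule and compare with the Jeffreys divergence between the endpoints. The sketch below is illustrative; the finite set, the values of $p_0$, $p_1$ and the number of subintervals are assumptions.

```python
import math


def kl(f, g):
    """D(f||g) on a finite set."""
    return sum(fx * math.log(fx / gx) for fx, gx in zip(f, g) if fx > 0)


def gen_escort(p1, p0, q):
    """Generalized escort (8) on a finite set; assumes p0, p1 > 0."""
    w = [a ** q * b ** (1.0 - q) for a, b in zip(p1, p0)]
    s = sum(w)
    return [x / s for x in w]


def fisher_variance(p1, p0, q):
    """I(q) from (18): variance of log(p1/p0) under p_q."""
    pq = gen_escort(p1, p0, q)
    l = [math.log(a / b) for a, b in zip(p1, p0)]
    mean = sum(w * x for w, x in zip(pq, l))
    return sum(w * (x - mean) ** 2 for w, x in zip(pq, l))


def simpson(fun, a, b, n=400):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = fun(a) + fun(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * fun(a + k * h)
    return total * h / 3.0


def both_sides_of_23(p1, p0, r, s):
    """Return (s - r) * integral of I over [r, s], and J(p_s, p_r)."""
    lhs = (s - r) * simpson(lambda q: fisher_variance(p1, p0, q), r, s)
    pr, ps = gen_escort(p1, p0, r), gen_escort(p1, p0, s)
    rhs = kl(ps, pr) + kl(pr, ps)  # Jeffreys divergence J(p_s, p_r)
    return lhs, rhs


p0 = [0.1, 0.2, 0.3, 0.4]
p1 = [0.4, 0.3, 0.2, 0.1]
print(both_sides_of_23(p1, p0, 0.0, 1.0))  # whole path, as in (24)
print(both_sides_of_23(p1, p0, 0.3, 2.0))
```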



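Finally, let $\theta_i$, $i = 1, \dots, M$, denote a set of intensive variables, which are some
functions of the index $q$. Then, we have that $\frac{\partial\log p}{\partial q} = \sum_{i=1}^M \frac{\partial\log p}{\partial\theta_i}\,\frac{d\theta_i}{dq}$, and the Fisher
information with respect to $q$ can be expressed as

$$I(q) = \sum_{i=1}^M\sum_{j=1}^M \frac{d\theta_i}{dq}\,I_{ij}(\theta)\,\frac{d\theta_j}{dq},$$

where $I(\theta)$ is the Fisher information matrix with respect to $\theta$. Therefore, for the
escort-path we introduced, we obtain that the thermodynamic divergence is nothing
but the Jeffreys divergence:

$$\mathcal{J} = \int_0^1 I(q)\,dq = \sum_{i=1}^M\sum_{j=1}^M\int_0^1 \frac{d\theta_i}{dq}\,I_{ij}(\theta)\,\frac{d\theta_j}{dq}\,dq = D(p_1\|p_0) + D(p_0\|p_1). \tag{27}$$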
Jo i=1 , =1 Jo 
4. Inference of a distribution subject to q-moment constraints



In the last section of this paper, we investigate some relationships between escort
distributions, information divergences, Fisher information and generalized Gaussians.
Let us return to the model of state transition presented in Section 2, which led us
to the generalized escort distribution (8) as the optimum intermediate between $p_0$
and $p_1$.

Assume that the distribution $p_1$ is not exactly known, but that the available
information is given as an expectation under the escort $p_q$. This expectation is the
so-called generalized expectation, or $q$-average, which is widely used in nonextensive
statistics, although it is generalized here by the presence of $p_0$. In our context, it
has the clear meaning of an expectation with respect to the intermediate distribution
$p_q$ at a given distance from a reference $p_0$, cf. Theorem 5, or with a given entropy, cf.
Corollary 6. Let the observable be given as the absolute moment of order $\alpha$:

$$m_{\alpha,q}[p_1] = E_q\left[|x|^\alpha\right] = \frac{\int |x|^\alpha\,p_1(x)^q\,p_0(x)^{1-q}\,d\mu(x)}{\int p_1(x)^q\,p_0(x)^{1-q}\,d\mu(x)}. \tag{28}$$

Typically, the observable could be a mean energy, where the statistical mean is
taken with respect to the escort distribution. Then, the question that arises is the
determination of a general distribution $p_1$ compatible with this constraint.

One may keep the idea of minimizing the divergence to $p_1$, as in the original
problem (7), which led us to the generalized escort distribution. Since the Kullback
divergence is a directed divergence, we shall keep the notion of direction by minimizing
$D(p_q\|p_1)$ for $q < 1$ and $D(p_1\|p_q)$ for $q > 1$. In both cases, the divergence is an
affine function of the Renyi divergence $D_q(p_1\|p_0)$, cf. (15). Therefore, these
minimizations are finally equivalent to the minimization of the Renyi divergence under
the generalized mean constraint.

In the same vein, we may consider the minimization of the symmetric Jeffreys'
divergence between $p_q$ and $p_1$. We have noticed in (16) that this divergence is also an
affine function of the Renyi divergence $D_q(p_1\|p_0)$. Therefore, its minimization is also
equivalent to the minimization of the Renyi divergence under the generalized mean
constraint.

Finally, a natural idea is to select the distribution $p$, and thus its escort $p_q$, so as to
minimize the thermodynamic divergence $\int_q^1 I(t)\,dt$ or $\int_1^q I(t)\,dt$ from $p_q$ to $p$, while
satisfying the constraint (28). We have seen that Jeffreys' divergence $J(p_1,p_q)$ is
proportional to the thermodynamic divergence, as indicated in (23). As a
consequence, the minimization of the thermodynamic divergence between $p_q$ and $p_1$ is
also equivalent to the minimization of the Renyi information divergence $D_q(p_1\|p_0)$.

It is known [6] that the maximization of the Renyi entropy subject to generalized
$q$-moment constraints, or equivalently of the Tsallis entropy under the same constraints,
leads to generalized Gaussian distributions. As far as the minimization of the Renyi
information divergence is concerned, a direct proof based on a simple inequality can
be derived along the lines of [31, Appendix 1] or [32, Proposition 4]. Therefore,
we have the following result.
Proposition 12. Among all distributions with a given $q$-moment of order $\alpha$ as
in (28), the distribution $p$ with minimum thermodynamic divergence, or equivalently
which minimizes Jeffreys' or Renyi divergence to its escort, is a generalized Gaussian
distribution given by

$$p(x) = \begin{cases} \dfrac{1}{Z_q(\gamma)}\left(1 - (1-q)\,\gamma|x|^\alpha\right)_+^{\frac{1}{1-q}}\,p_0(x) & \text{for } q \neq 1,\\[2mm] \dfrac{1}{Z_1(\gamma)}\exp\left(-\gamma|x|^\alpha\right)p_0(x) & \text{for } q = 1, \end{cases} \tag{29}$$

where we use the notation $(x)_+ = \max\{x, 0\}$, and where $Z_q(\gamma)$ is the normalization
factor.

When $p_0$ is uniform, the distribution becomes the standard Gaussian distribution
for $\alpha = 2$ in the limit case $q = 1$, by L'Hôpital's rule. This gives the rationale for
the denomination "generalized Gaussians". For $q < 1$, the probability density has
a compact support, while for $q > 1$ it has heavy tails with a power-law behavior
and is analogous to a Student distribution. These generalized Gaussians appear in
statistical physics, where they are the maximum entropy distributions of
nonextensive thermostatistics [6]. In this context, these distributions have been
observed to agree significantly with experimental data, and also to be the analytical
solution of actual physical problems [33], [34], [35]. In another field, the generalized
Gaussians are the one-dimensional instances of explicit extremal functions of Sobolev,
log-Sobolev or Gagliardo-Nirenberg inequalities on $\mathbb{R}^n$, with $n \ge 2$ [36].
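A minimal sketch of the kernel of (29), assuming $p_0(x) = 1$ with the Lebesgue measure (as in the example that follows) and the illustrative values $\gamma = 1$, $\alpha = 2$. It checks numerically the compact support for $q < 1$, the power-law tail for $q > 1$ and the Gaussian limit as $q \to 1$. The normalization $Z_q(\gamma)$ is computed by quadrature only for the compact-support case $q = 1/2$, where the exact value is $16\sqrt{2}/15$; the function names are hypothetical.

```python
import math


def generalized_gaussian_kernel(x, q, gamma=1.0, alpha=2.0):
    """Unnormalized density (29) with p0(x) = 1 (Lebesgue reference measure).

    q != 1: (1 - (1 - q) gamma |x|^alpha)_+ ** (1 / (1 - q)); q = 1: exp(-gamma |x|^alpha).
    """
    if q == 1.0:
        return math.exp(-gamma * abs(x) ** alpha)
    base = 1.0 - (1.0 - q) * gamma * abs(x) ** alpha
    if base <= 0.0:
        return 0.0  # (u)_+ = max(u, 0): outside the compact support when q < 1
    return base ** (1.0 / (1.0 - q))


def simpson(fun, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    total = fun(a) + fun(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * fun(a + k * h)
    return total * h / 3.0


# q < 1: support |x| <= ((1 - q) gamma)^(-1/alpha), i.e. sqrt(2) for q = 0.5
radius = (0.5 * 1.0) ** (-1.0 / 2.0)
z_half = simpson(lambda x: generalized_gaussian_kernel(x, 0.5), -radius, radius)
print(radius, z_half)

# q > 1: power-law tail; doubling x divides the kernel by about 2^(alpha/(q-1)) = 16
print(generalized_gaussian_kernel(100.0, 1.5) / generalized_gaussian_kernel(200.0, 1.5))

# q -> 1: the kernel approaches the Gaussian exp(-x^2)
print(generalized_gaussian_kernel(1.3, 1.0 - 1e-8), math.exp(-(1.3 ** 2)))
```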

Finally, let us close this paper with the example of a $q$-variance constraint, i.e.
$m_{2,q}[p] = \sigma_q^2$, with $p_0(x) = 1$ and $\mu$ the Lebesgue measure. We have seen that among
all distributions with a given differential entropy, the standard escort distribution
$p_q$ minimizes the Kullback-Leibler divergence to $p$, for some value of the index $q$
(Corollary 6). If $p$ is free but its escort is known to have a given variance, then the
distribution $p$ which minimizes the thermodynamic divergence or Jeffreys' divergence
(Proposition 12), or equivalently which maximizes the Renyi entropy, is the generalized
Gaussian (29) with $\alpha = 2$. In this setting, $p_q$ is located at the intersection of the
set of distributions with a given variance and of the set of distributions with a given
Shannon differential entropy. When $q$ varies, the optimum distributions follow a path
indexed by $q$ which is nothing but the path followed by the generalized Gaussians,
with compact support for $q < 1$ and infinite support for $q > 1$. In the limit case $q = 1$,
we obtain a standard Gaussian distribution, which is its own escort distribution, and
which has the maximum entropy among all escort distributions with the same variance.
These situations are illustrated in Figure 2.

5. Conclusions 

In this paper, we have presented a simple probabilistic model of transition between
two states, which leads naturally to a generalized escort distribution. This
generalized escort distribution describes a path, the escort-path, that connects
the two states. Then, we have connected several information measures, and
studied their evolution along the escort-path. In particular, we have obtained that
the Renyi information divergence appears naturally as a characterization of the
transition, and that the notion of escort mean values, as used in nonextensive
thermostatistics, receives a clear interpretation. We have studied the properties and the
evolution of Fisher information along the escort-path. In particular, we have shown
that the thermodynamic divergence on the escort-path is a simple function of
Jeffreys' divergence. We have also considered the problem of inferring a distribution
on the escort-path, subject to a moment constraint on its escort. Looking for the
distribution as the minimizer of the thermodynamic divergence, we have shown that
this procedure is equivalent to the minimization of the Renyi divergence subject to a
$q$-moment constraint, which gives a rationale for this approach. Finally, we have
recalled that generalized Gaussian distributions arise as solutions of the previous
problem.

Figure 2: Path of distributions with maximum Renyi entropy and fixed $q$-variance.
For each value of $q$, the optimum distribution $p$ whose escort $p_q$ has a given variance
is a generalized Gaussian. Thus, when $q$ varies, the path followed by $p$ is the manifold
of generalized Gaussians with index $q$.

Beyond the intrinsic interest of our geometric construction, which makes it possible to
connect several quantities of information theory, we have also pointed out possible
connections with finite thermostatistics. Furthermore, we have indicated that our
findings interrelate several ingredients of nonextensive statistics. Let us also
add that the literature usually points out that the standard entropy (or divergence)
is a particular case of generalized Renyi or Tsallis entropies. Our setting suggests
a possible additional layer, where the generalized quantities are derived from a
construction involving the classical information measures. Therefore, we believe that
the presented construction and the series of observations we made can be useful
to workers in this field. Future work should consider the extension of this setting
to the multivariate case. We also plan to look for possible connections with
finite-time thermodynamics, and to study the information-theoretic relationships
between generalized moments, Fisher information and generalized Gaussians.